tmap: How many towns with official bilingual names in Belgium?
It seems that we just love confusing foreigners… Imagine wanting to take a train from Ghent to Mons, but the only one listed is going to Bergen. Or driving south with a GPS telling you to follow the direction of Liège, while the road signs keep indicating Luik for as long as you are on Flemish territory.
Mons/Bergen, Liège/Luik, Ypres/Ieper… those names refer to exactly the same city - one of them is the official French name, the other one the official Dutch one.
Two weeks ago, I again heard a story from foreigners who got very confused, so I wanted to find out how many towns/cities we have with two official town names, and where they are.
I found everything I needed on this website from the Belgian government. The data is from early 2017.
Starting by loading the packages needed:
#packages for the data exploration
library(tidyverse)
library(readxl)
library(ggplot2)
#packages for the maps
library(sp)
library(tmap)
library(viridisLite)
library(leaflet)
library(BelgiumMaps.StatBel)
Importing the data
#Importing the data
raw_data <- read_excel("TF_SOC_POP_STRUCT_2017_tcm325-283761.xlsx", sheet=1)
The data contains a lot of unneeded administrative data, and I wanted to rename some columns to English.
#Keeping only the variables needed
data <- raw_data %>%
select(contains("MUNTY"), TX_RGN_DESCR_NL, CD_SEX, TX_NATLTY_NL, TX_CIV_STS_NL, CD_AGE, MS_POPULATION)
colnames(data) <- c("REFNIS", "TownNL", "TownFR", "Region", "Sex", "Nationality", "MaritalStatus", "Age", "Population")
#Translating Region names to English
data$Region <- data$Region %>%
str_replace("Vlaams Gewest", "Flanders") %>%
str_replace("Waals Gewest", "Wallonia") %>%
str_replace("Brussels Hoofdstedelijk Gewest", "Brussels agglomeration")
Additionally, the data does not have one population count per town but is divided into demographic subsets. A bit of filtering showed me that in my home town there are fewer than 30 people with the same characteristics as me (female, unmarried, Belgian, age 34), but that’s not really what interests me. So using dplyr, I created a population table and immediately added a column to compare town names in Flemish and French.
#Creating a dataframe with total population for each town, and adding a column to see whether they have the same name
popdata1 <- data %>%
group_by(TownNL, TownFR, Region, REFNIS) %>%
summarise(population=sum(Population)) %>%
arrange(desc(population)) %>%
mutate(SameName = TownNL==TownFR) %>%
ungroup()
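The aggregation step can be tried on a toy table first. The sketch below uses base R's aggregate() instead of dplyr so it runs without extra packages; the town names are real, but the subset counts are made up for illustration and are not from the government data.

```r
# Toy demographic subsets (illustrative counts, NOT the real data)
rows <- data.frame(
  TownNL = c("Bergen", "Bergen", "Jette", "Jette"),
  TownFR = c("Mons", "Mons", "Jette", "Jette"),
  Population = c(10, 20, 5, 7),
  stringsAsFactors = FALSE
)

# Sum the subsets to get one population figure per town
totals <- aggregate(Population ~ TownNL + TownFR, data = rows, FUN = sum)

# Flag towns whose Dutch and French names are identical
totals$SameName <- totals$TownNL == totals$TownFR
print(totals)
```

Each town ends up on a single row, which is what makes the name comparison meaningful.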
There is an issue though. While browsing through some breakouts, I noticed that some town names are annotated with their district. Beveren for instance is called the same in Flemish or French, but its district got translated.
#Noticing an issue:
popdata1%>%
filter(Region=="Flanders") %>%
filter(!SameName) %>%
slice (11:13)
## # A tibble: 3 x 6
## TownNL TownFR Region REFNIS
## <chr> <chr> <chr> <chr>
## 1 Beveren (Sint-Niklaas) Beveren (Saint-Nicolas) Flanders 46003
## 2 Dendermonde Termonde Flanders 42006
## 3 Vilvoorde Vilvorde Flanders 23088
## # ... with 2 more variables: population <dbl>, SameName <lgl>
To get rid of the districts, I cleaned out any word pattern between brackets, and redid the comparison to find out where town names are different.
#Removing the sectors between brackets
popdata <- popdata1
popdata$TownNL <- str_replace(popdata$TownNL, pattern="\\s\\(.+\\)", replacement="")
popdata$TownFR <- str_replace(popdata$TownFR, pattern="\\s\\(.+\\)", replacement="")
#Reassessing whether the names are the same
popdata <- popdata %>%
mutate(DiffName = TownNL != TownFR) %>%
select(TownNL, TownFR, DiffName, population, Region, REFNIS)
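To see what the bracket pattern does, here is a small base R sketch on the Beveren example from the output above. It uses sub(), which replaces the first match just like str_replace(). The stricter second pattern is my own suggestion, not part of the original analysis: it only strips a bracketed group at the very end of the name.

```r
# Sample names taken from the tibble printed earlier
town_names <- c("Beveren (Sint-Niklaas)", "Dendermonde", "Beveren (Saint-Nicolas)")

# Same pattern as in the post: a space, then anything between brackets
cleaned <- sub("\\s\\(.+\\)", "", town_names)
print(cleaned)

# Assumption: a stricter variant that only removes a trailing "(...)" group
strict <- sub("\\s\\([^)]*\\)$", "", town_names)

# After cleaning, the Dutch and French versions of Beveren compare as equal
same_after_cleaning <- cleaned[1] == cleaned[3]
print(same_after_cleaning)
```

Because `.+` is greedy, the original pattern would also swallow text between two separate bracketed groups; for names with a single trailing district both patterns give the same result.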
There are 95 towns/cities with two different official names, which is 16% of the total number of towns. Contrary to what some people assume, the share is more or less similar in both regions: 13% of Flemish towns have an official French name, and 16% of Walloon towns have an official Flemish name on top. Only Brussels, an officially bilingual region, has a much higher percentage of ‘double names’.
#How many have exactly the same name?
summary1 <- popdata %>%
summarise(NTowns_DiffName = sum(DiffName), Prop_DiffName = mean(DiffName))
knitr::kable(summary1, col.names= c("Number of towns with different name",
"Proportion of towns with different name"))
| Number of towns with different name | Proportion of towns with different name |
|---|---|
| 95 | 0.1612903 |
#by region
summary <- popdata %>%
group_by(Region) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
knitr::kable(summary)
| Region | NTowns | N_SameName | N_DiffName | Prop_SameName | Prop_DiffName |
|---|---|---|---|---|---|
| Brussels agglomeration | 19 | 6 | 13 | 0.32 | 0.68 |
| Flanders | 308 | 269 | 39 | 0.87 | 0.13 |
| Wallonia | 262 | 219 | 43 | 0.84 | 0.16 |
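The percentages quoted above follow directly from the counts in the regional table. As a quick sanity check, this base R sketch recomputes them using only the numbers printed in that table:

```r
# Counts copied from the regional summary table
n_towns <- c(Brussels = 19, Flanders = 308, Wallonia = 262)
n_diff  <- c(Brussels = 13, Flanders = 39,  Wallonia = 43)

# Share of towns with two names, per region
shares <- round(n_diff / n_towns, 2)
print(shares)

# Country-wide total and share
total_diff  <- sum(n_diff)
total_share <- round(total_diff / sum(n_towns), 4)
print(c(total_diff = total_diff, total_share = total_share))
```

The regional counts add up to the 95 towns reported in the first summary, so the two tables are consistent with each other.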
Using tmap I created my first two maps: one that shows the general regions in Belgium, and a second comparative one highlighting just the towns that have two official town names.
#Importing SPdataframe for Belgium
data("BE_ADMIN_MUNTY", package="BelgiumMaps.StatBel")
#Merging my 2017 data with the SPdataframe
mapdata <- merge(BE_ADMIN_MUNTY, popdata, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")
#Making a file containing only the towns with different names
popdata_DiffName <- popdata %>%
filter(DiffName==TRUE)
mapdataDiffName <- merge(BE_ADMIN_MUNTY, popdata_DiffName, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")
#Creating a colour palette
virpalette <- rev(viridis(3))
#Plot different regions
regionplot<- tm_shape(mapdata) +
tm_fill(col="Region", palette=virpalette,
title = "Regions in Belgium")+
tm_polygons(id="TownNL")+
tm_layout(legend.position = c("left", "bottom"))
#Plot to show those with different name by region
nameplot <- tm_shape(mapdataDiffName) +
tm_fill(col="Region", palette=virpalette, id="TownNL",
colorNA = "gray90", textNA="Same name",
title = "Different regional town names",legend.position = c("left", "bottom" ),
popup.vars = c("TownNL","TownFR", "population", "Reason"))+
tm_polygons(id="TownNL", "TownFR")+
tm_layout(legend.position = c("left", "bottom"))
tmap_arrange(regionplot, nameplot)
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).
A few things to notice: there is a slightly higher concentration of towns with two official town names around the language border, but it doesn’t really explain the full picture.
The table above showed that the Brussels region has a much higher share of towns with two official names: 68% versus the country average of 16%. Given Brussels’ bilingual status that should not come as a surprise. I was actually more surprised to realize that there are still 6 that only have their original Flemish name, and some of them, like “Ganshoren”, aren’t really that easy to pronounce.
#Checking the data on Brussels
popdata %>%
filter(Region=="Brussels agglomeration") %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 19 6 13 0.32 0.68
#List of names for Brussels
popdata %>%
filter(Region=="Brussels agglomeration") %>%
group_by(DiffName) %>%
arrange(desc(DiffName), desc(population))
## # A tibble: 19 x 6
## # Groups: DiffName [2]
## TownNL TownFR DiffName population
## <chr> <chr> <lgl> <dbl>
## 1 Brussel Bruxelles TRUE 176545
## 2 Schaarbeek Schaerbeek TRUE 133042
## 3 Sint-Jans-Molenbeek Molenbeek-Saint-Jean TRUE 96629
## 4 Elsene Ixelles TRUE 86244
## 5 Ukkel Uccle TRUE 82307
## 6 Vorst Forest TRUE 55746
## 7 Sint-Lambrechts-Woluwe Woluwe-Saint-Lambert TRUE 55216
## 8 Sint-Gillis Saint-Gilles TRUE 50471
## 9 Sint-Pieters-Woluwe Woluwe-Saint-Pierre TRUE 41217
## 10 Oudergem Auderghem TRUE 33313
## 11 Sint-Joost-ten-Node Saint-Josse-ten-Noode TRUE 27115
## 12 Watermaal-Bosvoorde Watermael-Boitsfort TRUE 24871
## 13 Sint-Agatha-Berchem Berchem-Sainte-Agathe TRUE 24701
## 14 Anderlecht Anderlecht FALSE 118241
## 15 Jette Jette FALSE 51933
## 16 Etterbeek Etterbeek FALSE 47414
## 17 Evere Evere FALSE 40394
## 18 Ganshoren Ganshoren FALSE 24596
## 19 Koekelberg Koekelberg FALSE 21609
## # ... with 2 more variables: Region <chr>, REFNIS <chr>
#Adding a column to note down the reason for different names
reason_BXL <- popdata %>%
filter(Region=="Brussels agglomeration") %>%
filter(DiffName) %>%
mutate(Reason = "Brussels")
Cities are generally more important, so I would have guessed that most of our cities have two official names. Comparing the average population of towns that have two names (TRUE) with those that don’t shows a clear skew towards larger towns. A quick plot in ggplot confirms this: grey shows all the towns in Belgium by population size on a logarithmic scale, and the towns with two names are coloured in green on top.
popdata %>%
group_by(DiffName) %>%
summarise(mean=mean(population), median=median(population))
## # A tibble: 2 x 3
## DiffName mean median
## <lgl> <dbl> <dbl>
## 1 FALSE 14744.06 11383
## 2 TRUE 42510.78 24701
#Plotting average town size of small and larger towns
ggplot()+
geom_histogram(data=popdata, aes(x=population), fill="grey", alpha=0.6)+
geom_histogram(data=subset(popdata, DiffName==TRUE), aes(x=population), fill="cadetblue4", alpha=1)+
scale_x_log10()+
labs(x= "Population", y="Number of towns", title="Size of towns with two official names amongst all towns in Belgium")
I took a shortcut to define our cities: the 10% most populous towns.
#10% largest towns and cities in Belgium
quantile(popdata$population, probs = seq(from = 0, to = 1, by = .1))
## 0% 10% 20% 30% 40% 50% 60% 70%
## 89.0 4372.2 6341.8 8308.4 10268.4 12123.0 14649.6 18473.6
## 80% 90% 100%
## 23259.6 34189.8 520504.0
#Proportion of Cities with different names
popdata %>%
filter(population > 34000) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 60 27 33 0.45 0.55
#Adding a reason column
reason_city <- popdata %>%
filter(population > 34000) %>%
filter(Region != "Brussels agglomeration") %>%
filter(DiffName) %>%
mutate(Reason = "City")
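The filter above uses a rounded cut-off of 34000, which sits slightly below the 90th percentile of 34189.8 printed by quantile(). A possible alternative, sketched here in base R, is to compute the threshold and reuse it instead of hard-coding it. The population vector is a made-up toy example, not the real data.

```r
# Toy populations (illustrative only)
population <- c(89, 1200, 4300, 6300, 8300, 10200,
                12100, 14600, 18400, 23200, 34100, 520504)

# 90th percentile, without the "90%" name attached
threshold <- quantile(population, probs = 0.9, names = FALSE)

# Towns above the threshold count as "cities" under this shortcut
is_city <- population > threshold
print(threshold)
print(population[is_city])
```

Computing the threshold this way keeps the definition of "city" in sync with the data if the population file is ever updated.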
After World War I, the peace treaty of Versailles listed the annexation of 9 German towns into Belgium as compensation. They make up our third language region, as German is still their main language today. Given that German and Dutch are both Germanic languages with a lot of similarities, it would make sense for the Flemish to use the German town names, while the French have changed some of them.
#Listing the German communes and the two additional towns with German facilities
germanspeaking <- c("Eupen", "Kelmis", "Lontzen", "Raeren", "Amel", "Büllingen",
"Burg-Reuland", "Bütgenbach", "Sankt Vith", "Malmedy", "Weismes")
#Proportion of German-region towns with different names
popdata %>%
filter(TownNL %in% germanspeaking) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 11 5 6 0.45 0.55
#German towns with two official names
popdata %>%
filter(TownNL %in% germanspeaking) %>%
filter(DiffName==TRUE) %>%
print(n=nrow(.))
## # A tibble: 6 x 6
## TownNL TownFR DiffName population Region REFNIS
## <chr> <chr> <lgl> <dbl> <chr> <chr>
## 1 Kelmis La Calamine TRUE 10964 Wallonia 63040
## 2 Sankt Vith Saint-Vith TRUE 9661 Wallonia 63067
## 3 Weismes Waimes TRUE 7493 Wallonia 63080
## 4 Bütgenbach Butgenbach TRUE 5583 Wallonia 63013
## 5 Amel Amblève TRUE 5523 Wallonia 63001
## 6 Büllingen Bullange TRUE 5489 Wallonia 63012
#Adding a reason column
reason_german <- popdata %>%
filter(TownNL %in% germanspeaking) %>%
filter(DiffName) %>%
mutate(Reason = "German region")
Always a topic for debate in Belgium: the towns with official language facilities. These are towns that belong to one region but they have some degree of bilingual facilities (it’s complicated!).
#Listing all towns with language facilities
faciliteiten <- c("Bever", "Drogenbos", "Herstappe", "Kraainem", "Linkebeek", "Mesen", "Ronse",
"Sint-Genesius-Rode", "Spiere-Helkijn", "Voeren", "Wemmel", "Wezembeek-Oppem",
"Edingen", "Komen-Waasten", "Moeskroen", "Vloesberg")
#Proportion of towns with language facilities that have different names
popdata %>%
filter(TownNL %in% faciliteiten) %>%
summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName),
Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
## NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
## <int> <int> <int> <dbl> <dbl>
## 1 16 6 10 0.38 0.62
#Which towns have different names?
popdata %>%
filter(TownNL %in% faciliteiten) %>%
filter(DiffName==TRUE) %>%
print(n=nrow(.))
## # A tibble: 10 x 6
## TownNL TownFR DiffName population Region
## <chr> <chr> <lgl> <dbl> <chr>
## 1 Moeskroen Mouscron TRUE 57773 Wallonia
## 2 Ronse Renaix TRUE 26092 Flanders
## 3 Sint-Genesius-Rode Rhode-Saint-Genèse TRUE 18231 Flanders
## 4 Komen-Waasten Comines-Warneton TRUE 18102 Wallonia
## 5 Edingen Enghien TRUE 13563 Wallonia
## 6 Voeren Fourons TRUE 4129 Flanders
## 7 Vloesberg Flobecq TRUE 3426 Wallonia
## 8 Bever Biévène TRUE 2160 Flanders
## 9 Spiere-Helkijn Espierres-Helchin TRUE 2142 Flanders
## 10 Mesen Messines TRUE 1049 Flanders
## # ... with 1 more variables: REFNIS <chr>
#Adding a reason column
reason_facilities <- popdata %>%
filter(TownNL %in% faciliteiten) %>%
filter(DiffName) %>%
anti_join(reason_city) %>%
mutate(Reason = "Language facilities")
To summarize, there are a few reasons why towns have different official names:
* They are part of a bilingual region (Brussels)
* They are a larger city
* They are part of the German region
* They have language facilities
* They are close to the language border
Along the way I added additional reason columns, which I now want to merge into the mapdata:
#Creating a reason column for all other towns with two names
reason_other <- popdata %>%
filter(DiffName) %>%
anti_join(reason_city) %>%
anti_join(reason_BXL) %>%
anti_join(reason_german) %>%
anti_join(reason_facilities) %>%
mutate(Reason = "Other")
#Merging reasons into one dataframe
reason <- bind_rows(reason_BXL, reason_city, reason_german, reason_facilities, reason_other)
#Searching for duplicates before join
reason %>%
group_by(REFNIS) %>%
filter(n() > 1)
## # A tibble: 0 x 7
## # Groups: REFNIS [0]
## # ... with 7 variables: TownNL <chr>, TownFR <chr>, DiffName <lgl>,
## # population <dbl>, Region <chr>, REFNIS <chr>, Reason <chr>
#Joining reasons into the main dataframe
popdata_reason <- left_join(popdata, reason)
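The chain of anti_join() calls effectively gives every town exactly one reason, with some reasons taking precedence over others. The base R sketch below shows the same idea as a priority-ordered assignment. The first four rows reuse names and populations from the outputs above, "Town X" is a hypothetical placeholder, and putting the German region above language facilities is my own assumption (the two lists do not overlap in this post).

```r
# Small illustrative table; "Town X" and its population are hypothetical
towns <- data.frame(
  TownNL = c("Brussel", "Moeskroen", "Ronse", "Kelmis", "Town X"),
  Region = c("Brussels agglomeration", "Wallonia", "Flanders", "Wallonia", "Flanders"),
  population = c(176545, 57773, 26092, 10964, 5000),
  stringsAsFactors = FALSE
)
germanspeaking <- c("Kelmis")
faciliteiten <- c("Moeskroen", "Ronse")

# Start with the catch-all and overwrite in increasing order of priority,
# so the last matching rule wins
towns$Reason <- "Other"
towns$Reason[towns$TownNL %in% faciliteiten] <- "Language facilities"
towns$Reason[towns$TownNL %in% germanspeaking] <- "German region"
towns$Reason[towns$population > 34000] <- "City"
towns$Reason[towns$Region == "Brussels agglomeration"] <- "Brussels"

# One row per town, so a later join cannot create duplicates
no_duplicates <- anyDuplicated(towns$TownNL) == 0
print(towns[, c("TownNL", "Reason")])
```

Moeskroen has language facilities but is labelled "City" here, matching the way reason_facilities excludes everything already in reason_city.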